Abstract. To identify our heavy-metal-resistant rhizobial strains accurately and rapidly, genomic average nucleotide
identity (ANI) and core-genome analyses were performed to investigate the phylogenetic relationships among 45 strains in
the families Rhizobiaceae and Bradyrhizobiaceae. The ANI and core-genome phylogenetic trees revealed similar
relationships. In the ANI analysis, ANI values of 90%, 75% and 70% could serve as thresholds for species, genus and family,
respectively. Multidimensional scaling and scatter plot analyses of the genomes were highly consistent with the ANI and
core-genome phylogenetic results. With these thresholds, the 45 strains were divided into 24 genomic species within the
genera Agrobacterium, Allorhizobium, Bradyrhizobium, Sinorhizobium and a putative novel genus represented by
Ag. albertimagni AOL15. The ten arsenite-oxidizing and antimonite-tolerant strains were identified as Ag. radiobacter and
two Sinorhizobium genomic species differing from S. fredii. In addition, the description of Pararhizobium is questioned
because ANI values greater than 75% were detected between P. giardinii H152 T and Sinorhizobium strains. Revision of the
species definition for several strains in R. etli and R. leguminosarum is also suggested. Our results demonstrate that ANI
and core-genome analyses are rapid and reliable methods for identifying rhizobial strains, and they will become more
convenient as more genome data accumulate.
I. Introduction

It is well known that the symbiotic bacteria (rhizobia) and the tumor-inducing phytopathogenic bacteria (agrobacteria) in
the family Rhizobiaceae are phylogenetically intermingled in some genera, and even within the same species. Originally,
the symbiotic bacteria were all grouped within the genus Rhizobium, which was established in 1890 with Rhizobium
leguminosarum as the type species [1, 2], and the tumor-inducing phytopathogenic bacteria were designated as the genus
Agrobacterium, first proposed by Conn to include Agrobacterium tumefaciens (tumor-inducing), Agrobacterium radiobacter
(no tumor) and Agrobacterium rhizogenes (hairy root) based on their phytopathogenic symptoms [3]. Later, Agrobacterium
rubi (from Rubus plants), Agrobacterium vitis (from Vitis plants) and Agrobacterium larrymoorei (from Ficus plants) were
established [4-6], and the agrobacteria were divided into Biovars I, II and III [7]. Based upon the phylogeny of the 16S rRNA
gene, the genus Agrobacterium and the later described genus Allorhizobium [8] were officially merged into Rhizobium [9].
However, this merger was frequently disputed because of the different effects of these bacteria on plants, their divergent
phylogenetic relationships in the 16S rRNA, 23S rRNA and recA genes [10-14], and their fatty acid profiles [15]. As more
and more symbiotic and non-symbiotic species were described in the combined genus Rhizobium, its polyphyletic nature
became increasingly apparent.

Meanwhile, novel molecular techniques have been developed for estimating phylogenetic relationships, such as multilocus
sequence analysis (MLSA) and whole-genome sequencing. Recently, the taxonomy of the Agrobacterium/Rhizobium group was
dramatically revised again based upon MLSA data from four or six protein-coding housekeeping genes [16-17], which led to
the split of the Agrobacterium/Rhizobium group into five sister genera: Agrobacterium, Allorhizobium, Neorhizobium,
Pararhizobium and Rhizobium. In the recently emended genus Agrobacterium, Ag. radiobacter and Ag. rubi are
phytopathogenic species, while Ag. nepotum, Ag. pusense and Ag. skierniewicense are new combinations transferred from
former Rhizobium species. The emended Allorhizobium covers the phytopathogenic species Al. vitis (formerly
Agrobacterium vitis) and the symbiotic or endophytic species Al. taibaishanense, Al. paknamense, Al. oryzae,
Al. pseudoryzae and Al. borbori. The genus Neorhizobium includes N. galegae, N. vignae, N. huautlense and N. alkalisoli,
transferred from former Rhizobium species [16]. Pararhizobium includes P. giardinii, P. capsulatum, P. herbae and
P. sphaerophysae [17], all transferred from former Rhizobium species. After this revision, the species represented by
Rhizobium leguminosarum are maintained in the genus Rhizobium, and the phytopathogenic species R. rhizogenes (formerly
Agrobacterium rhizogenes) is also included in this genus.

Despite the nomenclature changes and taxonomic revisions, pathogenic (to plants and humans), symbiotic, endophytic and
saprophytic bacterial species remain intermingled in the five Agrobacterium/Rhizobium sister genera [16-18]. Furthermore,
these four lifestyles can even be found within the single species Ag. radiobacter [19] or in the same strains of
R. rhizogenes [20]. Although the recent revisions have resolved the nomenclature dispute about the symbiotic Rhizobium
species and the phytopathogenic Agrobacterium species, the phylogenetic relationships between the symbiotic and
phytopathogenic species have not been sufficiently revealed because only a few housekeeping genes were considered [16-17].
To gain insight into the phylogenetic relationships among the members of the Agrobacterium/Rhizobium group, whole-genome
comparison would be valuable.

Previously, we isolated some arsenite-oxidizing or antimonite-tolerant strains, which were preliminarily identified as
unnamed species within Agrobacterium and Sinorhizobium based on 16S rRNA gene sequence analyses [21-23]. To identify them
further and to develop a rapid, reliable and high-throughput identification method, we performed this study using genome
data. In particular, the average nucleotide identity (ANI) and the core genome [24] were estimated to ascertain the
phylogenetic relationships among the 45 strains in the family Rhizobiaceae. The results offered accurate identification of
our test strains and generated some valuable taxonomic clues.

II. Materials and Methods


2.1 Genomic information 

In total, 45 available genome sequences were used in this study (see Supplementary Table S1 for details). Of these, 34 were
retrieved from NCBI GenBank in January 2015, including 31 Rhizobium-Agrobacterium strains, one Sinorhizobium strain and
two Bradyrhizobium strains, which were originally isolated from agricultural soils, root nodules, plant tumors, heavy
metal-contaminated soil or saline desert soil (Table S1). In addition, 11 genomes were sequenced in this study at Shanghai
Majorbio Bio-Pharm Technology Co., Ltd.: nine arsenite-oxidizing strains of Agrobacterium (6) and Sinorhizobium (3) and
one antimonite-tolerant Sinorhizobium strain, all isolated in our previous studies [21-23], and the type strain
Agrobacterium radiobacter DSM 30147 T. The NCBI GenBank accession numbers for the genomic sequences of the 45 strains are
shown in Supplementary Table S1. Genome annotation of these strains was performed with the NCBI Prokaryotic Genome
Annotation Pipeline (http://www.ncbi.nlm.nih.gov/genome/annotation_prok).

2.2 Phylogenetic analysis based on 16S rRNA genes (rrs) 

To determine the phylogenetic relationships among the 45 selected strains, the rrs sequences were either taken from single
rrs gene records in GenBank or retrieved from the genome sequences. Pairwise distances between strains were calculated
and a phylogenetic tree was reconstructed with the neighbor-joining (NJ) method in MEGA 5.05 [25].

2.3 Phylogenomic analysis based on core-genome sequences 

To assess genome diversity, all the coding sequences (CDSs) of the 45 genomes were pooled and searched against themselves
with the BlastP algorithm to find the core-genome sequences, using cutoffs of 50% protein identity and 70% of the whole
sequence length [26]. For the phylogenomic analysis, each set of core CDSs was aligned with ClustalW. All alignments were
then concatenated into a single string of amino acid sequences, and an NJ tree with 1,000 bootstrap replicates was built
using MEGA 5.05 [25].
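
The cutoff step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the tuple layout of `hits`, the function names, and the reading of "70% of the whole sequence" as alignment length divided by the longer of the two sequences are all hypothetical choices made for the example.

```python
from collections import defaultdict


def passes_cutoff(identity_pct, aln_len, query_len, subject_len,
                  min_identity=50.0, min_coverage=0.70):
    """Return True if a BlastP hit meets both cutoffs.

    Coverage is ASSUMED to be alignment length over the longer sequence;
    the source only states "70% of the whole sequences".
    """
    coverage = aln_len / max(query_len, subject_len)
    return identity_pct >= min_identity and coverage >= min_coverage


def core_families(hits, genomes):
    """Return the gene families with a passing hit in every genome.

    hits: iterable of (family_id, genome_id, identity_pct,
                       aln_len, query_len, subject_len)  # hypothetical layout
    """
    present = defaultdict(set)
    for family, genome, ident, aln, qlen, slen in hits:
        if passes_cutoff(ident, aln, qlen, slen):
            present[family].add(genome)
    wanted = set(genomes)
    return sorted(f for f, found in present.items() if found == wanted)


# Toy data: famB fails in genome g2 because identity is below 50%.
toy_hits = [
    ("famA", "g1", 80.0, 90, 100, 100),
    ("famA", "g2", 55.0, 75, 100, 100),
    ("famB", "g1", 90.0, 100, 100, 100),
    ("famB", "g2", 45.0, 100, 100, 100),
]
print(core_families(toy_hits, ["g1", "g2"]))
```

With real data, a family counts as core only if it passes the cutoffs in all 45 genomes.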

2.4 Phylogenomic analysis based on average nucleotide identity (ANI) values 

The ANI values between each pair of genomes among the 45 strains were calculated with the JSpecies software [27] according
to its instructions. Based on the pairwise ANI values, a lower-left matrix was constructed to represent the pairwise
distances (defined as 100% - ANI), and the matrix was used to build an ANI divergence dendrogram with the neighbor-joining
(NJ) method in MEGA 5.05 [25].
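
As a small worked example of the distance definition above, the following Python sketch converts pairwise ANI values into a lower-left distance matrix. The strain labels and ANI values are invented for illustration, and the function names are hypothetical; only the formula (distance = 100% - ANI) comes from the text.

```python
def ani_to_distance(ani_pct):
    """Distance as defined in the text: 100% minus ANI."""
    return 100.0 - ani_pct


def lower_left_matrix(strains, ani):
    """Build a lower-left (strictly lower triangular) distance matrix.

    ani maps frozenset({a, b}) -> ANI percent for each unordered pair.
    Row i holds the distances from strain i to strains 0..i-1.
    """
    return [
        [ani_to_distance(ani[frozenset((a, strains[j]))]) for j in range(i)]
        for i, a in enumerate(strains)
    ]


def n_pairs(n):
    """Number of unordered pairs among n genomes."""
    return n * (n - 1) // 2


# Invented values, for illustration only.
toy_strains = ["A", "B", "C"]
toy_ani = {
    frozenset(("A", "B")): 95.0,
    frozenset(("A", "C")): 72.0,
    frozenset(("B", "C")): 72.5,
}
for row in lower_left_matrix(toy_strains, toy_ani):
    print(row)
print(n_pairs(45))  # unordered pairs among 45 genomes
```

For 45 genomes the matrix has 45 x 44 / 2 = 990 entries.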

2.5 Multidimensional scaling (MDS) and scatter plot analyses based on pairwise ANI values 

It is widely accepted that high ANI values represent close taxonomic relationships [27]. Using the SPSS program [28], the
MDS algorithm [29] was applied to place each object in a 45-dimensional space while ensuring that the pairwise distances
were well preserved. Each point was then assigned coordinates in each of the 45 dimensions and, finally, the perceptual
map was shown in two dimensions. The scatter diagram, based on the coordinates calculated by MDS, was constructed in
Excel. In addition, a second scatter diagram, plotting the pairwise average genome size against the pairwise ANI values,
was created in Excel.
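
The idea of mapping a distance matrix onto two dimensions can be illustrated with a short Python sketch. The SPSS procedure used in this study is not reproduced here. The sketch substitutes classical (Torgerson) MDS, computed with plain power iteration, and assumes the distances are close to Euclidean so that the two leading eigenvalues are positive. The function names and toy distances are hypothetical.

```python
import math


def _matvec(m, v):
    """Multiply square matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def classical_mds_2d(dist, iters=500):
    """Embed n points in 2D from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        v = [math.sin(i + 1.0 + axis) for i in range(n)]  # deterministic start
        for _ in range(iters):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract on this axis
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        lam = sum(x * y for x, y in zip(v, _matvec(b, v)))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy distances (a 3-4-5 right triangle), for illustration only.
toy_dist = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
toy_points = classical_mds_2d(toy_dist)
```

In practice the input would be the 45 x 45 matrix of 100% - ANI distances, and the two returned coordinates per strain would be plotted as a scatter diagram.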


Table S1. General genomic information of the 45 strains used in this study.

Species                                   Isolation source                  Genome size (Mb)  GC content (%)  Predicted CDSs  Accession No.  Level
Agrobacterium sp. C13*                    Soil                              5.64              59.8            5303            ASYD00000000   draft
Agrobacterium sp. D14*                    Arsenic-enriched soil             5.54              59.8            5186            ASXX00000000   draft
Agrobacterium sp. JL28*                   Antimony mine                     5.65              59.8            5326            ASXZ00000000   draft
Agrobacterium sp. LY4*                    Soil                              5.64              59.8            5324            ASYA00000000   draft
Agrobacterium sp. TS43*                   Arsenic-enriched soil             5.65              59.8            5368            ASYB00000000   draft
Agrobacterium sp. TS45*                   Arsenic-enriched soil             5.64              59.8            5310            ASYC00000000   draft
Ag. tumefaciens 5A                        Arsenic-enriched soil             5.74              58.6            5520            AGVZ00000000   draft
Ag. tumefaciens GW4                       Arsenic polluted soil             5.64              59.8            5131            AWGV01000000   draft
Ag. radiobacter DSM 30147 T               Soil                              7.18              59.9            6834            ASXY00000000   draft
Ag. tumefaciens C58                       Cherry tree tumor                 5.67              59.1            5355            GCA_000092025  complete
Ag. tumefaciens Cherry 2E-2-2             Crown gall                        5.43              59.9            5045            APCC00000000   draft
Ag. tumefaciens CCNWGS0286                Zinc-lead mine tailing            5.21              59.5            4985            AGSM00000000   draft
Ag. tumefaciens F2                        Soil                              5.47              59.5            5321            AFSD00000000   draft
Agrobacterium sp. ATCC 31749              Soil                              5.46              59              5536            AECL00000000   draft
Agrobacterium sp. H13-3                   Rhizosphere                       5.57              58.5            5345            GCA_000192635  complete
Agrobacterium sp. 224MTsu3.1              Soil                              4.8               59.7            4593            ARQL00000000   draft
Ag. albertimagni AOL15                    Arsenite oxidizing biofilm        5.09              61.2            4811            ALJF00000000   draft
Allorhizobium vitis S4                    Vitis vinifera nodule             6.32              57.5            5389            GCA_000016285  complete
R. etli 8C-3                              Root nodule                       3.47              61.1            5076            ABRA00000000   draft
R. etli CFN 42 T                          Phaseolus vulgaris nodule         6.53              61.1            5963            GCA_000092045  complete
R. etli Kim 5                             Root nodule                       4.14              61.1            5963            ABQY00000000   draft
R. freirei PRF 81                         Bean nodule                       7.08              59.9            6271            AQHN00000000   draft
R. grahamii CCGE 502 T                    Root nodule                       7.15              59.4            7368            AEYE00000000   draft
R. gallicum bv. gallicum R602 T           Phaseolus vulgaris nodule         7.22              60.8            7152            GCA_000816845  complete
R. lupini HPC(L)                          Saline desert soil                5.27              59.2            4615            AMQQ00000000   draft
R. leguminosarum bv. trifolii WSM597      Trifolium polymorphum nodule      7.63              61              7159            AKHZ00000000   draft
R. leguminosarum bv. trifolii SRDI565     Trifolium polymorphum nodule      6.91              60.7            6870            AQUD00000000   draft
R. leguminosarum bv. phaseoli 4292        Bean nodule                       7.35              60.7            7255            AQZR00000000   draft
R. leguminosarum bv. viciae TOM           Legume root nodule                7.36              60.8            7298            AQUC00000000   draft
R. leguminosarum bv. viciae WSM1481       Legume root nodule                7.56              61              7548            AQUM00000000   draft
R. leguminosarum bv. viciae 3841          Legume root nodule                7.75              60.9            7131            GCA_000009265  complete
R. phaseoli Ch24-10                       Root nodule                       6.62              61.3            6512            AHJU00000000   draft
R. rhizogenes K84                         Plant root soil                   7.27              59.9            6285            GCA_000016265  complete
R. tropici CIAT 899 T                     Phaseolus vulgaris nodule         6.69              59.5            6287            GCA_000330885  complete
Rhizobium sp. 42MFCr.1                    Arabidopsis thaliana rhizosphere  6.21              59.9            6332            ARHV00000000   draft
Rhizobium sp. AP16                        Populus deltoides rhizosphere     6.5               60.2            6123            AJVM00000000   draft
Rhizobium sp. CF142                       Populus deltoides rhizosphere     7.46              60.1            7229            AJWE00000000   draft
Pararhizobium giardinii H152 T            Phaseolus vulgaris nodule         6.81              60.7            6782            ARBG00000000   draft
S. fredii USDA 205 T                      Soybean nodule                    7.01              62.3            6436            AUTC00000000   draft
Sinorhizobium sp. GL2*                    Arsenic polluted soil             7.05              62.1            7586            AUTB00000000   draft
Sinorhizobium sp. GL28*                   Arsenic polluted soil             8.45              61.6            7431            AUSZ00000000   draft
Sinorhizobium sp. GW3*                    Arsenic polluted soil             7.36              62              7450            AUSY00000000   draft
Sinorhizobium sp. Sb3*                    Coalmine                          6.08              61.6            7706            AUTA00000000   draft
Bradyrhizobium diazoefficiens USDA 110 T  Soybean nodule                    9.11              64.1            8373            GCA_000011365  complete
Bradyrhizobium japonicum USDA 6 T         Soybean nodule                    9.21              63.7            8826            GCA_000284375  complete

*The strains isolated and sequenced in this study. Type strains are marked with T.


III. Results 

3.1 General genomic features of the involved strains 

For the 18 strains previously classified into the genus Agrobacterium, three complete genomes (Ag. tumefaciens C58,
Agrobacterium-like sp. H13-3 and Al. vitis S4) and 15 draft genomes (including six obtained in this study) were available.
For the 19 Rhizobium strains, five complete genomes (R. etli CFN 42 T, R. leguminosarum bv. viciae 3841, R. tropici CIAT
899 T, R. rhizogenes K84 and R. gallicum R602sp T) and 14 draft genomes (including the type strain R. grahamii CCGE 502 T)
were found. In addition, draft genomes were obtained for P. giardinii H152 T and five Sinorhizobium strains (including the
type strain S. fredii USDA 205 T), and complete genomes were available for the two Bradyrhizobium strains B. diazoefficiens
USDA 110 T and B. japonicum USDA 6 T. The GC content of the 45 strains ranges from 57.5 to 64.1%. Genome sizes vary from
3.47 Mb (R. etli 8C-3) to 9.21 Mb (Bradyrhizobium japonicum USDA 6 T), whereas the number of predicted CDSs varies from
4593 (Agrobacterium sp. 224MTsu3.1) to 8826 (Bradyrhizobium japonicum USDA 6 T).

3.2 Phylogenetic relationship based on rrs sequences 

An NJ phylogenetic tree based on the rrs genes of the 45 strains (available as Fig. S1) revealed that the strains belonging
to Agrobacterium tumefaciens were separated into two branches and that Allorhizobium vitis S4 was interspersed among the
Ag. tumefaciens strains. In addition, Ag. radiobacter DSM 30147 T clustered with Rhizobium sp. PRF 81, R. tropici CIAT
899 T, Rhizobium sp. AP16 and R. rhizogenes K84 (Fig. S1). Rhizobium sp. CF142 clustered within the genus Agrobacterium,
while Rhizobium lupini HPC(L) grouped with Bradyrhizobium (Fig. S1).

Fig. S1. An NJ phylogenetic tree based on 16S rRNA gene sequences (rrs). The tree was built for 45 Rhizobium family
strains, which include six type strains. The average length of these 16S rRNA gene sequences is 1,389 bp. Horizontal
branch lengths are proportional to the estimated number of nucleotide substitutions, and bootstrap probabilities (as
percentages) are determined from 1,000 resamplings. The 16S rRNA gene sequence of Escherichia coli K12 was used as the
reference. Scale bar: 0.01.

3.3 Phylogenomic relationship based on the core-genome sequences 

Using the cutoffs of 50% protein identity and 70% of the whole sequence length, 313 core-genome CDSs were identified for
the 45 strains. In the phylogenomic tree based on the core genome (Fig. 1), the tested strains were grouped into six
lineages: 1) the Ag. radiobacter/tumefaciens (Biovar I)-R. lupini HPC(L) lineage; 2) the Ag. albertimagni (Biovar III)
lineage; 3) the Allorhizobium vitis (formerly Ag. vitis) lineage; 4) the Rhizobium lineage covering R. leguminosarum,
R. etli, R. phaseoli, R. gallicum, R. tropici, R. freirei, R. grahamii and R. rhizogenes (formerly Ag. rhizogenes); 5) the
Pararhizobium giardinii (formerly R. giardinii) and Sinorhizobium lineage; and 6) the Bradyrhizobium lineage.

Fig. 2. An NJ phylogenomic tree based on the average nucleotide identity (ANI) for the strains tested in this study
(Table S2). All of the strains could be clearly divided into Agrobacterium, Rhizobium, Allorhizobium, Pararhizobium,
Sinorhizobium and Bradyrhizobium branches. The strain R. lupini HPC(L) is labeled with a red star, since it clustered with
the Agrobacterium group.

3.4 Phylogenomic relationship based on ANI values 


The ANI values between each pair of genomes were calculated, giving 990 ANI values for the 45 strains (Table S2). In the
NJ phylogenomic tree constructed with the ANI data, the 45 strains were also divided into six lineages (Fig. 2), the same
as the lineages defined with the core genome (Fig. 1). Members of the distinct families Rhizobiaceae and Bradyrhizobiaceae
showed 66.00-68.01% ANI, and the strains within Rhizobiaceae presented ANI >70.54%. The ANI values were lower than 75%
among different genera in the family Rhizobiaceae, except for Pararhizobium, which presented 75.16-76.22% ANI with the
Sinorhizobium strains (Table S2). At the 90% ANI value, all the type strains of the defined species in the genus Rhizobium
were separated and the 45 strains could be delineated into 24 genomic species (Fig. 2 and Table S2). 1) Among the 17
strains belonging to Agrobacterium, 11 were identified as Ag. radiobacter, including all six tested arsenite-oxidizing
strains; 5 strains and R. lupini HPC(L) represented six distinct Agrobacterium genomic species (ANI <90% with the other
Agrobacterium strains); and the last strain, Ag. albertimagni AOL15, was a very divergent lineage sharing ANI of
72.42-73.18% with the other Agrobacterium strains. 2) Among the 18 Rhizobium strains (excluding the R. lupini strain),
R. phaseoli Ch24-10, R. etli 8C-3 and R. etli Kim 5 formed one genomic species; the six R. leguminosarum strains and
R. gallicum R602 T formed another; R. rhizogenes K84 and Rhizobium sp. AP16 represented a third; and the other six strains
were single lineages corresponding to R. etli, R. tropici, R. freirei, R. grahamii and two unnamed species. 3) In the genus
Sinorhizobium, strains GL28 and Sb3 formed sp. I, while GW3 and GL2 formed sp. II; both differed from the type strain
S. fredii USDA 205 T. 4) Pararhizobium giardinii H152 T was grouped with Sinorhizobium as the most divergent lineage
(ANI >75% with the Sinorhizobium strains). 5) The two Bradyrhizobium strains were two lineages corresponding to
B. japonicum and B. diazoefficiens, respectively. 6) The remaining genomic species was Allorhizobium vitis S4
(Figs. 1 and 2).
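
One simple way to turn a table of pairwise ANI values into genomic species at the 90% cutoff is sketched below in Python. The source delineates species from the ANI tree and Table S2 and does not state a clustering rule, so the single-linkage grouping used here is an assumption, and the strain labels and ANI values are invented for illustration.

```python
def genomic_species(strains, ani, threshold=90.0):
    """Group strains whose pairwise ANI reaches the threshold.

    ani maps (strain_a, strain_b) -> ANI percent. Grouping is single
    linkage via union-find, an ASSUMED rule for this sketch.
    """
    parent = {s: s for s in strains}

    def find(s):
        # Follow parent links to the group representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Invented values, for illustration only.
toy_strains = ["s1", "s2", "s3", "s4"]
toy_ani = {
    ("s1", "s2"): 96.8,
    ("s1", "s3"): 85.0,
    ("s2", "s3"): 86.0,
    ("s1", "s4"): 72.0,
    ("s2", "s4"): 72.5,
    ("s3", "s4"): 73.0,
}
print(genomic_species(toy_strains, toy_ani))
```

Single linkage can chain strains together when values sit close to the cutoff, so borderline groups are worth checking against the tree.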
3.5 Similarity levels using MDS and scatter plot analyses based on pairwise ANI values

In the MDS scatter diagram (Fig. 3), the 45 genomes (represented by 45 spots) were clearly separated into five groups:
1) eighteen strains within Rhizobium (all except R. lupini) formed a group located on the upper right side; 2) 16 strains
within the Agrobacterium group (all except Ag. albertimagni AOL15) and Rhizobium lupini HPC(L) were located on the upper
left side of the vertical axis; 3) five strains of the Sinorhizobium group together with P. giardinii were distributed near
the vertical axis, with Ag. albertimagni near them; 4) the two Bradyrhizobium strains formed a group located on the bottom
right side of the vertical axis; and 5) Al. vitis S4 occupied a unique position that differed from all the other groups
(Fig. 3).



Fig. 3. The multidimensional scaling (MDS) analysis based on the pairwise ANI values. Each point represents a single
strain, and the distance between two points represents the relative genetic distance between the two strains. The strains
are divided into six groups, Agrobacterium, Rhizobium, Allorhizobium, Pararhizobium, Sinorhizobium and Bradyrhizobium,
which are indicated by blue (Rhizobium), red (Agrobacterium), grey (Allorhizobium), yellow (Pararhizobium), green
(Sinorhizobium) and pink (Bradyrhizobium), respectively. The number of strains in each group is labeled.

To further determine the similarity level of strains within each genus, another scatter plot analysis of the 45 strains was
performed based on the 990 pairwise ANI values (Fig. 4A). Since only one strain belonged to each of the genera
Allorhizobium and Pararhizobium, within-genus similarity could not be compared for them. The strains within the genus
Rhizobium possessed a wide range of ANI values (approximately 72-98%), indicating diverse genetic distances among the
strains within this genus (Fig. 4A). In contrast, the strains belonging to Agrobacterium, Sinorhizobium and Bradyrhizobium
showed relatively narrow ranges of ANI values (86-100% for Agrobacterium; 78-98% for Sinorhizobium; and 89% for
Bradyrhizobium) (Fig. 4A). The strains within Bradyrhizobium shared the lowest ANI similarity with the strains belonging
to the other genera (approximately 67%; Fig. 4, yellow), and most of the strains within the Rhizobium, Agrobacterium and
Sinorhizobium groups shared 71-75% ANI similarity with each other (Fig. 4A), except that the ANI similarities between
R. lupini HPC(L) and the Agrobacterium strains were higher (~85-88%, Fig. 4A) than those with the strains in other genera
(~71-75%, Fig. 4A). Moreover, without R. lupini HPC(L), the Rhizobium strains showed a narrower range of ANI values
(76-98%, Fig. 4B), indicating that R. lupini HPC(L) may be more appropriately reclassified into Agrobacterium.

Fig. 4. Plotted results (990 points) of pairwise average genome size versus pairwise ANI values. Each point represents the
pairwise ANI value of two strains. (A) The blue (171 points), red (153 points), green (10 points), pink (1 point) and
yellow (636 points) plots indicate the pairwise average genome size versus pairwise ANI values of the strains within
Rhizobium, Agrobacterium, Sinorhizobium and Bradyrhizobium, and among the four groups, respectively. Since only one strain
belonged to each of Allorhizobium and Pararhizobium, the pairwise average genome size versus pairwise ANI values within
these two genera cannot be compared. (B) Without R. lupini HPC(L), only the 153 blue points representing the comparison of
the 18 Rhizobium strains are shown.

3.6 The core genes among the studied strains at the genus level

To further understand the shared and distinct genetic characteristics of the tested strains, the core genes in
Agrobacterium, Rhizobium, Sinorhizobium and Bradyrhizobium were identified. Allorhizobium and Pararhizobium were excluded
since only one strain was used in each of these two genera. The 18 strains previously classified into Agrobacterium shared
891 core genes, while the 20 strains within the genus Rhizobium had 768 core genes. When R. lupini HPC(L) was moved into
the Agrobacterium group, Al. vitis was separated from Agrobacterium, and P. giardinii H152 T was separated from Rhizobium,
the core genes of the Agrobacterium and Rhizobium groups rose to 1,065 and 977, respectively, further supporting their
reclassification.
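
The effect of moving a divergent strain out of a group can be illustrated with set intersections. The Python sketch below uses invented gene-family labels and hypothetical function names; it is not the procedure used to obtain the counts above. Because an intersection can only stay the same or grow when a member is removed, the informative quantity is the size of the increase rather than the increase itself.

```python
def core_genes(gene_sets):
    """Return the gene families present in every genome of the group.

    gene_sets maps genome name -> iterable of gene-family identifiers.
    """
    sets = [set(s) for s in gene_sets.values()]
    return set.intersection(*sets) if sets else set()


def core_without(gene_sets, excluded):
    """Core genes after removing the named genomes from the group."""
    kept = {g: s for g, s in gene_sets.items() if g not in excluded}
    return core_genes(kept)


# Invented gene families, for illustration only.
toy_group = {
    "strain_a": {"f1", "f2", "f3"},
    "strain_b": {"f1", "f2", "f4"},
    "outlier": {"f1", "f5"},
}
print(sorted(core_genes(toy_group)))                 # shared by all three
print(sorted(core_without(toy_group, {"outlier"})))  # shared by a and b only
```

In the toy data the core grows from one family to two once the outlier is excluded, mirroring the direction of change reported for Agrobacterium and Rhizobium.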


IV. Discussion 

The taxonomy and nomenclature of genera in Rhizobiaceae have changed dramatically in the last two decades with the
development of taxonomic methods, especially the application of distinct molecular methods. Currently, Agrobacterium,
Allorhizobium, Ensifer (Sinorhizobium), Neorhizobium, Pararhizobium and Rhizobium are described or emended based upon the
phylogenetic relationships of the 16S rRNA gene and multilocus sequence analysis [16-17]. All these genera contain
symbiotic nitrogen-fixing and tumor-inducing phytopathogenic bacteria [19-20], as well as saprophytic and endophytic
bacteria [30]. Meanwhile, genome sequencing data have been considered in the description of novel genera and species in
the family Rhizobiaceae, such as Rhizobium lentis and sister species [31] and Pseudorhizobium pelagicum [32]. These studies
demonstrated that genome analyses are valuable for the classification of Rhizobium-Agrobacterium related bacteria.

In the present study, the ten arsenite-oxidizing or antimonite-tolerant strains were identified by comparing their genome
sequences with 35 other related genome sequences available in the database. Our phylogenomic analyses of both the core
genome and the ANI supported the definition of Agrobacterium, Allorhizobium, Sinorhizobium (Ensifer) and Rhizobium
(Figs. 1 and 2), and these groups were also supported by the MDS analysis and scatter plot based on pairwise ANI values
(Figs. 3 and 4). These results demonstrate that ANI and core-genome analyses are both convenient and reliable taxonomic
methods. From our data, the following threshold values could be drawn: 1) 70% for family (66.00-68.01% between
Bradyrhizobiaceae and Rhizobiaceae, >70.54% among the strains within Rhizobiaceae); 2) 75% for genus, which fits the
definition of Agrobacterium, Allorhizobium, Sinorhizobium and Rhizobium; 3) 90% for species, according to the
differentiation of R. etli, R. leguminosarum, R. rhizogenes, R. tropici, R. freirei and the two species of Bradyrhizobium.
Applying these threshold values, all six arsenite-oxidizing Agrobacterium strains (C13, D14, JL28, LY4, TS43 and TS45)
could be identified as Ag. radiobacter since they shared ANI >96.8% with each other and >94.50% with the type strain. As to
the three arsenite-oxidizing and one antimonite-tolerant Sinorhizobium strains, GL28 and Sb3 could be identified as
Sinorhizobium sp. I, and GL2 and GW3 as Sinorhizobium sp. II; both showed ANI values >78.21% with S. fredii USDA 205 T.
The exact taxonomic affiliation of the four Sinorhizobium strains can be further determined by comparison with other
defined species in the genus.
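
The three thresholds above can be written as a small lookup. The Python sketch below is illustrative only: the function name is hypothetical, and treating each cutoff as inclusive (>=) is an assumption, since the text does not say how values exactly on a threshold are handled. The example inputs are ANI values quoted in this paper.

```python
def shared_rank(ani_pct, species=90.0, genus=75.0, family=70.0):
    """Return the lowest taxonomic rank two genomes share at a given ANI.

    Cutoffs follow the thresholds proposed in the text; inclusive
    comparison (>=) is an ASSUMPTION made for this sketch.
    """
    if ani_pct >= species:
        return "same species"
    if ani_pct >= genus:
        return "same genus"
    if ani_pct >= family:
        return "same family"
    return "different families"


# ANI values quoted in the text.
print(shared_rank(94.50))  # arsenite-oxidizing strains vs. Ag. radiobacter type strain
print(shared_rank(75.16))  # P. giardinii H152 T vs. Sinorhizobium strains
print(shared_rank(72.42))  # Ag. albertimagni AOL15 vs. other Agrobacterium strains
print(shared_rank(66.00))  # Rhizobiaceae vs. Bradyrhizobiaceae
```

Applied to the 75.16% value, the lookup places P. giardinii H152 T in the same genus as the Sinorhizobium strains, which is the basis for questioning Pararhizobium.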

In addition to the identification of our test strains, several taxonomic clues are worth discussing. 1) Apart from the 11
strains of Ag. radiobacter, the other five Agrobacterium strains and R. lupini HPC(L) shared ANI of between 84.99% and
88.72% with the Ag. radiobacter strains, indicating that they might represent sister species of Ag. radiobacter within the
group previously termed Agrobacterium sensu stricto [33]. Rhizobium lupini HPC(L) is apparently a misnamed strain: it
showed closer relationships with R. etli and Rhizobium leguminosarum in a 16S rRNA analysis [34], but based on the ANI and
core-genome analyses it should be reclassified into Agrobacterium as a sister species of Ag. radiobacter. This change does
not affect the nomenclature of the species, since the type strain of R. lupini, USDA 3051 T, has been reclassified as
Bradyrhizobium lupini based on the comparison of 16S rRNA, recA and glnII genes [35]. 2) The strain Ag. albertimagni AOL15,
whose genus assignment was reported as quite uncertain [36], seems to represent an independent genus based upon its ANI
<74.29% with the other strains involved in the study. 3) The strain P. giardinii H152 T seems to belong to the genus
Sinorhizobium (ANI 75.16-76.22%); therefore, the description of Pararhizobium based upon the MLSA results [17, 33] is
questionable. 4) The classification of R. phaseoli Ch24-10, R. etli 8C-3 and R. etli Kim 5 should be re-examined since they
formed a genomic species different from the type strain of R. etli. 5) The species definition of the six R. leguminosarum
strains should be revised since they presented ANI values greater than 90% with the type strain R. gallicum R602 T.
6) Rhizobium sp. AP16 could be identified as R. rhizogenes. All of these observations were supported by the core-genome
analysis (Fig. 1), the ANI tree (Fig. 2), the ANI values (Table S2) and the MDS and scatter plot analyses (Figs. 3 and 4).
In addition, the number of core genes increased when calculated without R. lupini HPC(L) and Ag. albertimagni AOL15,
respectively (Fig. 5), which is also consistent with the ANI and core-genome analyses.



Fig. 5. The comparison of core genes among the Agrobacterium, Rhizobium, Sinorhizobium and Bradyrhizobium genera. The
numbers of core genes in Agrobacterium, Rhizobium, Sinorhizobium and Bradyrhizobium were 891, 768, 3,545 and 6,280,
respectively (marked as original). However, if R. lupini HPC(L) was clustered into the Agrobacterium group, and
Allorhizobium vitis S4 and Pararhizobium giardinii H152 T were classified into the new genus (marked as
re-classification), the core genes in the Agrobacterium and Rhizobium groups would change to 1,065 and 977, respectively.

A considerable advantage of ANI and core-genome analyses over MLSA or single-gene analyses (16S rRNA or recA) for species
identification is their stability and the worldwide accessibility of the underlying information. In this study, we gathered
genomic information for the 45 strains and constructed a mini-database of 990 pairwise ANI values (Table S2). This
mini-database provides a first-step ANI resource that allows users to rapidly complete a genome-based ANI identification
of strains within the family Rhizobiaceae. In addition, the core-genome analysis compared hundreds of common genes,
including housekeeping genes such as the 16S rRNA gene and recA, which makes the comparison more convincing. Sequencing
bacterial genomes is now cost-efficient, and good-quality draft genomes are sufficient for ANI or core-genome comparisons.
Thus, the ANI and core-genome methodologies provide powerful tools for phylogenomic studies.

V. Conclusion 

In conclusion, we propose ANI and core-genome analyses as convenient methods to estimate the phylogenetic relationships of
rhizobia-related strains, following the thresholds of 90%, 75% and 70% ANI for species, genus and family, respectively.
With these thresholds, we identified the ten arsenite-oxidizing and antimonite-tolerant strains as Ag. radiobacter and two
Sinorhizobium genomic species differing from S. fredii. In addition, the description of Pararhizobium is questioned because
ANI values greater than 75% were detected between P. giardinii H152 T and Sinorhizobium strains. Revision of the species
definition for several strains in R. etli and R. leguminosarum is also suggested. Our results demonstrate that ANI and
core-genome analyses are powerful supplementary methods for the taxonomic identification of bacterial strains.